FIGURE 6.3
Subfigures (a) and (b) illustrate the robustness of the Gaussian distribution and the bimodal distribution, respectively. From left to right in each subfigure, we plot the distribution of the unbinarized weights w_i and of the binarized weights bw_i. XNOR-Net's drawback lies in subfigure (a): if a disturbance γ acts on the unbinarized weights around the threshold of the discrete activation, it causes a significant disturbance in the binarized weights. Subfigure (b) shows the robustness of the bimodal distribution under the same disturbance.
Expectation-Maximization (EM) [175] method to constrain the distribution of weights. As
shown in Fig. 6.3 (b), the model is robust to disturbances. Furthermore, we introduce a
learnable and adaptive scale factor for every 1-bit layer to enhance the feature representation
capacity of our binarized networks. Finally, we obtain a powerful 1-bit network for point cloud processing, which can reconstruct the amplitude of its real-valued counterpart via a new learning-based method.
6.3.1 Problem Formulation
We first consider a general quantization problem for accelerating the pointwise operations of deep networks by computing with quantized or discrete weights. We design the quantization process by projecting the full-precision (32-bit) variable x onto a set

Q = \{a_1, a_2, \cdots, a_n\}, \qquad (6.34)

where Q is a discrete set and n is its size, determined by the bit width. For example, n is set to $2^{16}$ when performing 16-bit quantization.
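As a concrete illustration, a uniform 16-bit quantization set over a symmetric range could be constructed as follows (a minimal NumPy sketch; the range [-1, 1] and the uniform spacing are assumptions, since Eq. 6.34 only requires Q to be a discrete set):

```python
import numpy as np

# Illustrative only: Eq. 6.34 does not require uniform spacing or this range.
n = 2 ** 16                          # number of quantization levels for 16-bit quantization
Q = np.linspace(-1.0, 1.0, num=n)    # a discrete set {a_1, ..., a_n} over [-1, 1]
```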
Then, we define the projection of x \in \mathbb{R} onto the set Q as

P_{\mathbb{R} \to Q}(x) =
\begin{cases}
a_1, & x < \frac{a_1 + a_2}{2} \\
\;\cdots \\
a_i, & \frac{a_{i-1} + a_i}{2} \le x < \frac{a_i + a_{i+1}}{2} \\
\;\cdots \\
a_n, & \frac{a_{n-1} + a_n}{2} \le x
\end{cases}. \qquad (6.35)
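The projection in Eq. 6.35 simply maps x to the level of Q whose mid-point interval contains it. Below is a minimal sketch of this rule, assuming NumPy (the function name project_to_set and the 4-level example set are illustrative, not part of the method):

```python
import numpy as np

def project_to_set(x, levels):
    """Project full-precision values x onto a sorted discrete set Q = {a_1, ..., a_n}
    using the mid-point thresholds (a_i + a_{i+1}) / 2 of Eq. 6.35."""
    levels = np.asarray(levels, dtype=np.float64)
    thresholds = (levels[:-1] + levels[1:]) / 2.0        # (a_i + a_{i+1}) / 2
    idx = np.searchsorted(thresholds, x, side="right")   # index of the containing interval
    return levels[idx]

# Usage: project a few weights onto an illustrative 4-level set
Q = [-1.0, -0.33, 0.33, 1.0]
w = np.array([-0.9, -0.1, 0.2, 0.8])
print(project_to_set(w, Q))   # [-1.   -0.33  0.33  1.  ]
```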
By projecting 32-bit weights and activations onto low-bit sets, the computation cost can be greatly reduced. In the extreme case, binarizing the weights and activations of neural networks decreases the storage and computation cost by 32× and 64×, respectively, since each 32-bit value is replaced by a single bit and multiply-accumulate operations are replaced by XNOR and bit-count operations. Considering the binarization process of BNNs, Eqs. 6.34 and 6.35 are relaxed into
P_{\mathbb{R} \to B}(x) =
\begin{cases}
-1, & x < 0 \\
+1, & 0 \le x
\end{cases}, \quad \text{s.t. } B = \{-1, +1\}, \qquad (6.36)
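In this binary special case, the projection reduces to the sign function with zero mapped to +1. A small sketch under the same assumptions as above (the helper name is illustrative):

```python
import numpy as np

def project_to_binary(x):
    """Binarize onto B = {-1, +1}: -1 for x < 0 and +1 for 0 <= x (Eq. 6.36)."""
    return np.where(np.asarray(x, dtype=np.float64) < 0, -1.0, 1.0)

print(project_to_binary([-0.7, 0.0, 0.4]))   # [-1.  1.  1.]
```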